A Algorithms

Neural Information Processing Systems

We directly adopt the official default settings for Atari games.

B.2 Minecraft Environment Settings

Table 1 outlines how we set up and initialize the environment for each harvest task. Our method is tested in two biomes: plains and sunflower plains, both of which offer a wide field of view. In Minecraft, the action space is an 8-dimensional multi-discrete space.
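A multi-discrete action space of this kind can be sketched as below. The per-dimension cardinalities in `NVEC` are hypothetical placeholders, since the text does not list them:

```python
import numpy as np

# Hypothetical sub-action sizes for an 8-dimensional multi-discrete space;
# the actual per-dimension cardinalities are not given in the text.
NVEC = np.array([3, 3, 4, 25, 25, 8, 16, 36])

def sample_action(rng: np.random.Generator) -> np.ndarray:
    """Sample one action: one integer per dimension, each in [0, NVEC[i])."""
    return rng.integers(0, NVEC)

rng = np.random.default_rng(0)
action = sample_action(rng)  # shape (8,), one choice per sub-action
```

Each environment step then consumes the full 8-vector, with every component selecting one option for its sub-action (e.g. movement, camera, use).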


1 Supplement

1.1 Model Architectures

Figure 1: Model Architectures for Latent Integration

Neural Information Processing Systems

Farmworld is an open-ended gridworld environment designed with two goals in mind: high customizability and support for diverse solutions. Maps can be hand-crafted or randomly generated. RGB images are used, so agents 'see' exactly what we see: units visibly lose health through damage patterns. Observations are partial: agents do not see the entire map. Reward: agents receive an individual reward of 0.1 for each timestep that they are alive. To this end, we augmented DIAYN and call the result DIAYN*. We subtract the batch mean so that, on average, the expected agent reward equals only what is provided by the extrinsic environment.
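The batch-mean subtraction described above can be sketched as follows; the function name and array shapes are illustrative, not taken from the paper's code:

```python
import numpy as np

def augmented_reward(r_ext, r_int):
    """Combine extrinsic rewards with batch-mean-centered intrinsic rewards.

    Centering makes the intrinsic term zero-mean over the batch, so on
    average the total reward equals the extrinsic reward alone.
    """
    r_int = np.asarray(r_int, dtype=float)
    centered = r_int - r_int.mean()
    return np.asarray(r_ext, dtype=float) + centered

# Three agents, each alive (0.1 extrinsic), with differing intrinsic rewards:
r = augmented_reward([0.1, 0.1, 0.1], [0.5, 1.0, 1.5])
# the batch mean of r is exactly the 0.1 extrinsic reward
```

This keeps the skill-discovery signal as a shaping term without inflating the total reward the agents collect.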


Successor-Predecessor Intrinsic Exploration (Changmin Yu, Neil Burgess)

Neural Information Processing Systems

Exploration is essential in reinforcement learning, particularly in environments where external rewards are sparse. Here we focus on exploration with intrinsic rewards, where the agent transiently augments the external rewards with self-generated intrinsic rewards. Although the study of intrinsic rewards has a long history, existing methods focus on composing the intrinsic reward based on measures of future prospects of states, ignoring the information contained in the retrospective structure of transition sequences. Here we argue that the agent can utilise retrospective information to generate explorative behaviour with structure-awareness, facilitating efficient exploration based on global instead of local information. We propose Successor-Predecessor Intrinsic Exploration (SPIE), an exploration algorithm based on a novel intrinsic reward combining prospective and retrospective information. We show that SPIE yields more efficient and ethologically plausible exploratory behaviour in environments with sparse rewards and bottleneck states than competing methods. We also implement SPIE in deep reinforcement learning agents, and show that the resulting agent achieves stronger empirical performance than existing methods on sparse-reward Atari games.
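The abstract does not give SPIE's exact reward formula. As a hedged illustration of combining prospective and retrospective information, a tabular sketch might maintain a successor representation alongside a predecessor analogue (the same TD update run on reversed transitions) and turn their combined occupancy into a novelty bonus. All names and the specific bonus form are assumptions:

```python
import numpy as np

n_states, gamma, lr = 5, 0.95, 0.1
SR = np.eye(n_states)  # successor representation: expected discounted future occupancy
PR = np.eye(n_states)  # predecessor analogue: same update on reversed transitions

def td_update(M, s, s_next):
    """TD(0) update for a tabular occupancy representation."""
    target = np.eye(n_states)[s] + gamma * M[s_next]
    M[s] += lr * (target - M[s])

def step(s, s_next):
    td_update(SR, s, s_next)   # prospective: forward transition
    td_update(PR, s_next, s)   # retrospective: reversed transition

def intrinsic_reward(s_next):
    # Hypothetical bonus: states with low combined prospective and
    # retrospective occupancy are treated as novel and rewarded more.
    return 1.0 / (SR[:, s_next].sum() + PR[:, s_next].sum())
```

Because both representations aggregate over whole transition sequences, such a bonus reflects global structure (e.g. bottleneck states) rather than purely local visit counts.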


Regularity as Intrinsic Reward for Free Play

Neural Information Processing Systems

We propose regularity as a novel reward signal for intrinsically-motivated reinforcement learning. Taking inspiration from child development, we postulate that striving for structure and order helps guide exploration towards a subspace of tasks that are not favored by naive uncertainty-based intrinsic rewards. Our generalized formulation of Regularity as Intrinsic Reward (RaIR) allows us to operationalize it within model-based reinforcement learning. In a synthetic environment, we showcase the plethora of structured patterns that can emerge from pursuing this regularity objective. We also demonstrate the strength of our method in a multi-object robotic manipulation environment. We incorporate RaIR into free play and use it to complement the model's epistemic uncertainty as an intrinsic reward. Doing so, we witness the autonomous construction of towers and other regular structures during free play, which leads to a substantial improvement in zero-shot downstream task performance on assembly tasks.
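The paper's precise formulation is not reproduced in this abstract. One plausible sketch measures regularity as the negative entropy of discretized pairwise object distances: many identical relations (stacks, grids) give low entropy and hence high regularity. The distance-based relation and bin size are illustrative assumptions:

```python
from collections import Counter
from itertools import combinations

import numpy as np

def regularity(positions, bin_size=0.1):
    """Negative entropy of the distribution of binned pairwise distances.

    A maximally regular arrangement (all relations identical) scores 0;
    more varied relations score lower (more negative).
    """
    dists = [np.linalg.norm(np.asarray(a) - np.asarray(b))
             for a, b in combinations(positions, 2)]
    bins = Counter(round(d / bin_size) for d in dists)
    total = sum(bins.values())
    probs = np.array([c / total for c in bins.values()])
    return float(np.sum(probs * np.log(probs)))
```

Used as an intrinsic reward, such an objective pushes free play toward ordered configurations, complementing uncertainty-based bonuses that favor chaotic, hard-to-predict states.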